Skip to content

amnesia-ab CONFIRMED + Ollama backend adapter + v0.1.1 prep#67

Merged
OpenCircuitDev merged 7 commits into
mainfrom
feat/bench-coverage-dashboard-trends
Jun 11, 2026
Merged

amnesia-ab CONFIRMED + Ollama backend adapter + v0.1.1 prep#67
OpenCircuitDev merged 7 commits into
mainfrom
feat/bench-coverage-dashboard-trends

Conversation

@OpenCircuitDev

Copy link
Copy Markdown
Owner

What's in here (5 commits on top of main)

  1. Dropbox→Git migration snapshot (e4613de, pre-existing on this branch)
  2. bench/isolation/memory/amnesia-ab — first memory sandbox to RUN, verdict CONFIRMED (87c19c8)
    • memory-ON fact recall 94.2% (confirm ≥70) · retrieval hit rate 100% (confirm ≥80) · memory-OFF sanity 2.5% (must be ≤25)
    • llama3 8B Q4 + mxbai-embed-large, 62-memory corpus w/ cross-project distractors, 20 tasks, objective key-fact scoring
    • OFF-arm failure mode is confident fabrication, not ignorance — the memory loop is the product, demonstrated
  3. Ollama backend adapter in ocm-inference (ad2162a) — native NDJSON /api/chat, health via /api/tags, max_tokens→num_predict; parser tests pinned to verbatim live-daemon captures. Selector untouched (daemon settings wiring is the follow-up).
  4. README correction (1121e21) — registry is 3/3 SHA256-verified since chore: drop unhashed Qwen3 entries; spec blockers cleared #50; the '5 GGUFs / open hash blocker' claims were stale.
  5. v0.1.1 draft release notes (5c60cb5) — docs/release-notes/v0.1.1-draft.md.

Verification

  • amnesia-ab: full prompt/output/score rows in bench/isolation/memory/amnesia-ab/results/run-2026-06-11T20-32-21.json
  • Rust changes: CI matrix (fmt + clippy + test × ubuntu/macos/windows) on this branch — run 27376070667

🤖 Generated with Claude Code

Brand and others added 7 commits May 30, 2026 19:46
The cheapest discriminating test of the central loop (spec row 9, library-
driven retrieval) as a faithful miniature: mxbai-embed-large cosine top-5 ->
inject -> llama3 8B Q4 via Ollama, 62-memory corpus w/ cross-project
distractors, 20 tasks, objective key-fact scoring.

Measured (results/run-2026-06-11T20-32-21.json):
  memory_on_fact_recall_pct  94.2  (confirm >=70)
  retrieval_hit_rate_pct    100.0  (confirm >=80)
  memory_off_fact_recall_pct  2.5  (sanity <=25 — corpus not guessable)
  latency p50 on/off         19.5s / 12.3s

OFF-arm failure mode is confident fabrication, not ignorance — the memory
loop is the difference between correct specifics and plausible lies on an
8B model. Per the decision rule this justifies: the Ollama backend adapter,
activating mem0-v3-locomo, and cutting v0.1.0.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… Ollama users

Third InferenceBackend: bridges OCM to an existing Ollama daemon via its
native NDJSON /api/chat API (model tag required per-request; max_tokens
maps to options.num_predict; health via /api/tags). Selector untouched —
explicit construction for now; daemon settings wiring is the follow-up.

Parser test fixtures are VERBATIM captures from a live Ollama daemon
(llama3, 2026-06-11) — pinned to the real wire format. Motivated by the
amnesia-ab sandbox CONFIRMED verdict (94.2% fact recall on this exact
daemon + model class).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… blocker resolved

The 'five model SHA256 hashes' pre-release blocker was cleared when the
unhashed Qwen3 entries were dropped (#50); the shipping registry is 3
models, all hashed. README still claimed 5 GGUFs + an open blocker.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ose + bench.py

The dry-run validator requires docker-compose.yml + bench.py for ACTIVE
sandboxes (caught by Bench Framework CI on PR #67 — the framework doing
its job). bench.py delegates to run.mjs (ONE harness, the exact artifact
that produced the CONFIRMED result); compose runs it in node:22-slim
against the HOST Ollama daemon via host-gateway, same host-dependency
pattern as vllm-q4-llama8b. run.mjs now honors OLLAMA_URL.

Local validate_compose: PASS.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…row 9

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@OpenCircuitDev OpenCircuitDev merged commit b65cfcb into main Jun 11, 2026
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant